Membership Inference Attacks against Language Models via Neighbourhood Comparison
Membership inference attacks (MIAs) aim to predict whether a data sample was
present in the training data of a machine learning model or not, and are widely
used for assessing the privacy risks of language models. Most existing attacks
rely on the observation that models tend to assign higher probabilities to
their training samples than non-training points. However, simple thresholding
of the model score in isolation tends to lead to high false-positive rates as
it does not account for the intrinsic complexity of a sample. Recent work has
demonstrated that reference-based attacks which compare model scores to those
obtained from a reference model trained on similar data can substantially
improve the performance of MIAs. However, in order to train reference models,
attacks of this kind make the strong and arguably unrealistic assumption that
an adversary has access to samples closely resembling the original training
data. Therefore, we investigate their performance in more realistic scenarios
and find that they are highly fragile with respect to the data distribution used
to train reference models. To investigate whether this fragility provides a
layer of safety, we propose and evaluate neighbourhood attacks, which compare
model scores for a given sample to scores of synthetically generated neighbour
texts and therefore eliminate the need for access to the training data
distribution. We show that, in addition to being competitive with
reference-based attacks that have perfect knowledge about the training data
distribution, our attack clearly outperforms existing reference-free attacks as
well as reference-based attacks with imperfect knowledge, which demonstrates
the need for a reevaluation of the threat model of adversarial attacks.
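As a rough illustration of the neighbourhood comparison idea, the sketch below scores a sample under a target causal LM, builds neighbour texts by masking and refilling single words with an off-the-shelf masked LM, and treats a sample that scores noticeably better than its neighbours as a likely training member. The model names (gpt2, bert-base-uncased), the number of neighbours, and the scoring rule are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a neighbourhood comparison attack (illustrative only).
import torch
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForMaskedLM)

target_tok = AutoTokenizer.from_pretrained("gpt2")
target_lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
mlm_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def target_loss(text: str) -> float:
    """Average negative log-likelihood of the text under the target model."""
    ids = target_tok(text, return_tensors="pt").input_ids
    return target_lm(ids, labels=ids).loss.item()

@torch.no_grad()
def make_neighbours(text: str, n: int = 10) -> list[str]:
    """Generate neighbours by masking one word at a time and refilling it
    with the masked LM's top prediction."""
    words = text.split()
    neighbours = []
    for i in range(min(n, len(words))):
        masked = words.copy()
        masked[i] = mlm_tok.mask_token
        enc = mlm_tok(" ".join(masked), return_tensors="pt")
        logits = mlm(**enc).logits[0]
        pos = (enc.input_ids[0] == mlm_tok.mask_token_id).nonzero()[0, 0]
        masked[i] = mlm_tok.decode(logits[pos].argmax())
        neighbours.append(" ".join(masked))
    return neighbours

def membership_score(text: str) -> float:
    """Members should score noticeably better than their neighbours."""
    losses = [target_loss(t) for t in make_neighbours(text)]
    return sum(losses) / len(losses) - target_loss(text)
```

A higher score means the original sample is likelier than its rewritten neighbours under the target model, which reference-free thresholding alone cannot distinguish from the sample simply being low-complexity text.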
Unique Identification of 50,000+ Virtual Reality Users from Head & Hand Motion Data
With the recent explosive growth of interest and investment in virtual
reality (VR) and the so-called "metaverse," public attention has rightly
shifted toward the unique security and privacy threats that these platforms may
pose. While it has long been known that people reveal information about
themselves via their motion, the extent to which this makes an individual
globally identifiable within virtual reality has not yet been widely
understood. In this study, we show that a large number of real VR users
(N=55,541) can be uniquely and reliably identified across multiple sessions
using just their head and hand motion relative to virtual objects. After
training a classification model on 5 minutes of data per person, a user can be
uniquely identified amongst the entire pool of 50,000+ with 94.33% accuracy
from 100 seconds of motion, and with 73.20% accuracy from just 10 seconds of
motion. This work is the first to truly demonstrate the extent to which
biomechanics may serve as a unique identifier in VR, on par with widely used
biometrics such as facial or fingerprint recognition.
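A minimal sketch of this kind of identification pipeline, under the assumption that each session is a fixed-width array of head and hand position/rotation channels: reduce the time axis to simple summary statistics and fit an off-the-shelf classifier. The feature set and the random-forest classifier are illustrative choices, not the study's actual model.

```python
# Illustrative motion-based identification: summary statistics + classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def summarise(session: np.ndarray) -> np.ndarray:
    """session: (T, C) array of head + two-hand position/rotation channels.
    Collapse the time axis into per-channel summary statistics."""
    return np.concatenate([
        session.mean(axis=0),
        session.std(axis=0),
        np.abs(np.diff(session, axis=0)).mean(axis=0),  # mean speed proxy
    ])

def fit_identifier(sessions: list[np.ndarray], user_ids: list[int]):
    """Train on enrolment sessions; clf.predict identifies users in new ones."""
    X = np.stack([summarise(s) for s in sessions])
    return RandomForestClassifier(n_estimators=300).fit(X, user_ids)
```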
Differentially Private Language Models for Secure Data Sharing
To protect the privacy of individuals whose data is being shared, it is of high importance to develop methods allowing researchers and companies to release textual data while providing formal privacy guarantees to its originators. In the field of NLP, substantial efforts have been directed at building mechanisms following the framework of local differential privacy, thereby anonymizing individual text samples before releasing them. In practice, these approaches are often unsatisfactory in terms of the quality of their output language due to the strong noise required for local differential privacy. In this paper, we approach the problem at hand using global differential privacy, particularly by training a generative language model in a differentially private manner and subsequently sampling data from it. Using natural language prompts and a new prompt-mismatch loss, we are able to create highly accurate and fluent textual datasets taking on specific desired attributes such as sentiment or topic and resembling statistical properties of the training data. We perform thorough experiments indicating that our synthetic datasets do not leak information from our original data, are of high language quality, and are highly suitable for training models for further analysis on real-world data. Notably, we also demonstrate that training classifiers on private synthetic data outperforms training classifiers directly with DP-SGD.
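For reference, the sketch below shows one DP-SGD step in plain PyTorch, the mechanism underlying differentially private training: per-example gradients are clipped to a fixed norm, summed, and perturbed with Gaussian noise before the averaged update. The clip norm and noise multiplier are illustrative; real language-model training would use a library such as Opacus rather than this naive per-example loop.

```python
# Illustrative DP-SGD step: per-example clipping + Gaussian noise.
import torch

def dp_sgd_step(model, batch_x, batch_y, loss_fn, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(batch_x, batch_y):            # per-example gradients
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in params]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):           # clip to clip_norm and sum
            s += g * scale
    n = len(batch_x)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p -= lr * (s + noise) / n             # noisy averaged update
```

Clipping bounds each individual's influence on the update, and the calibrated noise converts that bound into a formal (global) differential-privacy guarantee for the trained model, and hence for anything sampled from it.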
Smaller Language Models are Better Black-box Machine-Generated Text Detectors
With the advent of fluent generative language models that can produce
convincing utterances very similar to those written by humans, distinguishing
whether a piece of text is machine-generated or human-written becomes more
challenging and more important, as such models could be used to spread
misinformation, fake news, fake reviews and to mimic certain authors and
figures. To this end, there have been a slew of methods proposed to detect
machine-generated text. Most of these methods need access to the logits of the
target model or need the ability to sample from the target. One such black-box
detection method relies on the observation that generated text is locally
optimal under the likelihood function of the generator, while human-written
text is not. We find that overall, smaller and partially-trained models are
better universal text detectors: they can more precisely detect text generated
from both small and larger models. Interestingly, we find that whether the
detector and generator were trained on the same data is not critically
important to the detection success. For instance, the OPT-125M model has an AUC
of 0.81 in detecting ChatGPT generations, whereas a larger model from the GPT
family, GPT-J-6B, has an AUC of 0.45.